Speaker conversion through non-linear frequency warping of STRAIGHT spectrum
Abstract
A parametric method for converting speech individuality is proposed, based on the STRAIGHT speech representation. STRAIGHT analysis-synthesis can produce high-quality speech under various kinds of transformations by using 1) pitch-synchronous windowing, 2) time-frequency spectrum interpolation and 3) randomized all-pass filtering for shaping the phase spectrum. To exploit the smoothness of the STRAIGHT spectrum, speech conversion is accomplished by warping the frequency axis. The warping functions are trained for each class of a predetermined grouping of spectrum shapes. An evaluation test compares the proposed method with VQ prototype mapping and with linear transformation of cepstrum vectors. As a measure of converted speech quality, the mean opinion score (MOS) from 6 subjects is calculated and is found to be about 1.5 points better than that of the conventional methods, without degrading the accuracy of speech individuality discrimination.
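The idea of warping the frequency axis of a smooth spectral envelope can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a piecewise-linear warping function given by hypothetical anchor pairs (`src_anchors`, `tgt_anchors`, in normalized frequency) and plain linear interpolation of a uniformly sampled envelope. The per-class training of warping functions described in the abstract is not shown.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Linearly interpolate the points (xs, ys) at x.

    xs must be strictly increasing; values outside the range are clamped.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def warp_spectrum(envelope, src_anchors, tgt_anchors):
    """Warp a spectral envelope along the frequency axis.

    envelope    : magnitudes on uniformly spaced bins covering [0, 1]
                  (normalized frequency; an assumption of this sketch).
    src_anchors : increasing source frequencies, starting at 0 and ending at 1.
    tgt_anchors : increasing target frequencies of the same length; a feature
                  at src_anchors[k] is moved to tgt_anchors[k].
    """
    n = len(envelope)
    grid = [k / (n - 1) for k in range(n)]
    warped = []
    for f in grid:
        # Inverse mapping: find which source frequency lands on output bin f.
        f_src = interp(f, tgt_anchors, src_anchors)
        # Read the source envelope at that (generally off-grid) frequency.
        warped.append(interp(f_src, grid, envelope))
    return warped


if __name__ == "__main__":
    # Toy envelope with a single peak at normalized frequency 0.25.
    env = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    # Hypothetical warp that moves the peak from 0.25 up to 0.5.
    print(warp_spectrum(env, [0.0, 0.25, 1.0], [0.0, 0.5, 1.0]))
```

Because the warp only relocates energy along the frequency axis instead of averaging spectra, a smooth input envelope stays smooth, which is the property the abstract says the method relies on.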
Similar resources
Fast estimation of warping factors in vocal tract length normalization using scores obtained from gender detection modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by variations in speakers, audio channels and environmental conditions. Making these systems robust to such variations is still a big challenge. One of the main sources of variation among speakers is the difference in their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
High quality voice conversion based on Gaussian mixture model with dynamic frequency warping
In voice conversion algorithms based on the Gaussian Mixture Model (GMM), the quality of the converted speech is degraded because the converted spectrum is excessively smoothed. In this paper, we propose a GMM-based algorithm with Dynamic Frequency Warping (DFW) to avoid this over-smoothing. We also propose that the converted spectrum is calculated by mixing the GMM-based converted sp...
Frequency Warping for Speaker Adaptation in HMM-based Speech Synthesis
Speaker adaptation in speech synthesis transforms a source utterance to a target utterance that differs from the source in terms of voice characteristics. In this paper, we apply vocal tract length normalization, which is generally used in speech recognition to remove individual speaker characteristics, to speaker adaptation in speech synthesis. We propose a frequency warping approach based on...
Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems
A systematic framework for non-periodic excitation source representation is proposed for high-quality speech manipulation systems such as TANDEM-STRAIGHT, which is basically a channel VOCODER. The proposed method consists of two subsystems for non-periodic components: a colored noise source and an event analyzer/generator. The colored noise source is represented by using a sigmoid model with no...
Normalization of speaker variability by spectrum warping for robust speech recognition
This paper examines techniques for normalization of unseen speakers in recognition. Two implementations of linear spectrum warping were examined: time domain resampling and filter bank scaling. It is shown that for seen speakers, the models trained by unwarped utterances are less sensitive to spectrum warping by filter bank scaling than by resampling. A pitch-based scheme for warping factor est...